What if Readers Like A.I.-Generated Fiction?
Finally, he gave the summaries to his fine-tuned model, and he asked it to compose passages "in the style of Vauhini Vara." Going into all this, I was self-assured, even smug. I'd always felt that my style was original and, more important, that my books were totally distinct from one another. I figured that, even if the A.I. model could imitate my past books, it couldn't predict the style of the novel in progress. So, when Chakrabarty sent me the A.I.-generated imitations, I was genuinely confused.
- South America (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > Michigan (0.04)
- (7 more...)
- Personal (1.00)
- Research Report > New Finding (0.46)
- Media > News (0.46)
- Education > Educational Setting > K-12 Education (0.46)
Advancing Autonomous Driving: DepthSense with Radar and Spatial Attention
Hussain, Muhammad Ishfaq, Naz, Zubia, Rafique, Muhammad Aasim, Jeon, Moongu
Depth perception is crucial for spatial understanding and has traditionally been achieved through stereoscopic imaging. However, the precision of depth estimation using stereoscopic methods depends on the accurate calibration of binocular vision sensors. Monocular cameras, while more accessible, often suffer from reduced accuracy, especially under challenging imaging conditions. Optical sensors, too, face limitations in adverse environments, leading researchers to explore radar technology as a reliable alternative. Although radar provides coarse but accurate signals, its integration with fine-grained monocular camera data remains underexplored. In this research, we propose DepthSense, a novel radar-assisted monocular depth enhancement approach. DepthSense employs an encoder-decoder architecture, a Radar Residual Network, feature fusion with a spatial attention mechanism, and an ordinal regression layer to deliver precise depth estimations. We conducted extensive experiments on the nuScenes dataset to validate the effectiveness of DepthSense. Our methodology not only surpasses existing approaches in quantitative performance but also reduces parameter complexity and inference times. Our findings demonstrate that DepthSense represents a significant advancement over traditional stereo methods, offering a robust and efficient solution for depth estimation in autonomous driving. By leveraging the complementary strengths of radar and monocular camera data, DepthSense sets a new benchmark in the field, paving the way for more reliable and accurate spatial perception systems.
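The fusion step described above, attention-weighted blending of camera and radar feature maps, can be sketched in a few lines. The pooling-based mask below is a generic spatial-attention stand-in, not a reproduction of DepthSense's actual layers or learned weights:

```python
import numpy as np

def spatial_attention_fuse(cam_feat, radar_feat):
    """Fuse two (H, W, C) feature maps with a spatial attention mask.

    Generic sketch: the mask comes from channel-wise average and max
    pooling over the concatenated features; DepthSense's real module
    uses learned convolutions instead of the fixed weights here.
    """
    fused = np.concatenate([cam_feat, radar_feat], axis=-1)  # (H, W, 2C)
    avg_pool = fused.mean(axis=-1, keepdims=True)            # (H, W, 1)
    max_pool = fused.max(axis=-1, keepdims=True)             # (H, W, 1)
    # stand-in for a 1x1 conv over the two pooled maps: equal weights
    logits = 0.5 * avg_pool + 0.5 * max_pool
    mask = 1.0 / (1.0 + np.exp(-logits))                     # sigmoid
    # convex combination: where the mask is high, trust the camera
    return mask * cam_feat + (1.0 - mask) * radar_feat       # (H, W, C)

rng = np.random.default_rng(0)
cam = rng.random((4, 4, 8))
radar = rng.random((4, 4, 8))
out = spatial_attention_fuse(cam, radar)
print(out.shape)  # (4, 4, 8)
```

Because the mask lies in (0, 1), each output element stays between the corresponding camera and radar values, so the fusion can never amplify either modality beyond its input range.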
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- Asia > South Korea > Gwangju > Gwangju (0.04)
- Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)
- (8 more...)
- Transportation > Ground > Road (0.61)
- Information Technology > Robotics & Automation (0.61)
A Humanoid Visual-Tactile-Action Dataset for Contact-Rich Manipulation
Kwon, Eunju, Oh, Seungwon, Baek, In-Chang, Park, Yucheon, Kim, Gyungbo, Moon, JaeYoung, Choi, Yunho, Kim, Kyung-Joong
Contact-rich manipulation has become increasingly important in robot learning. However, previous studies on robot learning datasets have focused on rigid objects and underrepresented the diversity of pressure conditions for real-world manipulation. To address this gap, we present a humanoid visual-tactile-action dataset designed for manipulating deformable soft objects. The dataset was collected via teleoperation using a humanoid robot equipped with dexterous hands, capturing multi-modal interactions under varying pressure conditions.

Contact-rich interaction represents a critical gateway for enabling robots to perform complex tasks in real-world environments, yet it remains one of the fundamental challenges in robotic manipulation [1].
Regional Attention-Enhanced Swin Transformer for Clinically Relevant Medical Image Captioning
Naz, Zubia, Asghar, Farhan, Hussain, Muhammad Ishfaq, Hadadi, Yahya, Rafique, Muhammad Aasim, Choi, Wookjin, Jeon, Moongu
Automated medical image captioning translates complex radiological images into diagnostic narratives that can support reporting workflows. We present a Swin-BART encoder-decoder system with a lightweight regional attention module that amplifies diagnostically salient regions before cross-attention. Trained and evaluated on ROCO, our model achieves state-of-the-art semantic fidelity while remaining compact and interpretable. We report results as mean$\pm$std over three seeds and include $95\%$ confidence intervals. Compared with baselines, our approach improves ROUGE (proposed 0.603, ResNet-CNN 0.356, BLIP2-OPT 0.255) and BERTScore (proposed 0.807, BLIP2-OPT 0.645, ResNet-CNN 0.623), with competitive BLEU, CIDEr, and METEOR. We further provide ablations (regional attention on/off and token-count sweep), per-modality analysis (CT/MRI/X-ray), paired significance tests, and qualitative heatmaps that visualize the regions driving each description. Decoding uses beam search (beam size $=4$), length penalty $=1.1$, $no\_repeat\_ngram\_size$ $=3$, and max length $=128$. The proposed design yields accurate, clinically phrased captions and transparent regional attributions, supporting safe research use with a human in the loop.
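One decoding setting the abstract reports, $no\_repeat\_ngram\_size$ $=3$, can be illustrated with a small standalone sketch (generic, not the authors' implementation): at each step the decoder bans any token whose emission would recreate a trigram already present in the generated sequence.

```python
def banned_next_tokens(tokens, n=3):
    """Return the tokens that would complete an n-gram already present
    in `tokens`. This mirrors the no_repeat_ngram_size=3 decoding
    constraint reported in the paper; a generic sketch of the rule,
    not the authors' code."""
    if len(tokens) < n - 1:
        return set()
    prefix = tuple(tokens[-(n - 1):])  # the last n-1 generated tokens
    banned = set()
    # any earlier occurrence of the same (n-1)-token prefix bans
    # the token that followed it
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

# With trigram blocking, after "a b c a b" the token "c" is banned,
# since emitting it would repeat the trigram (a, b, c).
print(banned_next_tokens(["a", "b", "c", "a", "b"], n=3))  # {'c'}
```

In beam search, this check runs per beam before scoring, which is why larger beams (the paper uses beam size 4) still cannot loop on short phrases.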
- Asia > South Korea > Gwangju > Gwangju (0.05)
- Asia > Middle East > Saudi Arabia > Eastern Province > Al-Ahsa Governorate > Al-Hofuf (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Deformable Dynamic Convolution for Accurate yet Efficient Spatio-Temporal Traffic Prediction
Jin, Hyeonseok, Kim, Geonmin, Kim, Kyungbaek
Traffic prediction is a critical component of intelligent transportation systems, enabling applications such as congestion mitigation and accident risk prediction. While recent research has explored both graph-based and grid-based approaches, key limitations remain. Graph-based methods effectively capture non-Euclidean spatial structures but often incur high computational overhead, limiting their practicality in large-scale systems. In contrast, grid-based methods, which primarily leverage Convolutional Neural Networks (CNNs), offer greater computational efficiency but struggle to model irregular spatial patterns due to the fixed shape of their filters. Moreover, both approaches often fail to account for inherent spatio-temporal heterogeneity, as they typically apply a shared set of parameters across diverse regions and time periods. To address these challenges, we propose the Deformable Dynamic Convolutional Network (DDCN), a novel CNN-based architecture that integrates both deformable and dynamic convolution operations. The deformable layer introduces learnable offsets to create flexible receptive fields that better align with spatial irregularities, while the dynamic layer generates region-specific filters, allowing the model to adapt to varying spatio-temporal traffic patterns. By combining these two components, DDCN effectively captures both non-Euclidean spatial structures and spatio-temporal heterogeneity. Extensive experiments on four real-world traffic datasets demonstrate that DDCN achieves competitive predictive performance while significantly reducing computational costs, underscoring its potential for large-scale and real-time deployment.
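The dynamic-convolution idea above, generating region-specific filters from the input rather than sharing one fixed kernel, can be illustrated with a toy 1-D sketch. The softmax "filter generator" below is a hypothetical stand-in for DDCN's learned generator network:

```python
import math

def dynamic_conv1d(signal, k=3):
    """Dynamic convolution sketch: the filter at each position is
    generated from the local input window (here via a softmax over the
    window values) instead of being one shared, fixed kernel.
    Illustrative only; DDCN's filter generator is a learned network."""
    half = k // 2
    out = []
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        # filter-generator stand-in: softmax of the window itself,
        # so the effective filter adapts to each region of the input
        exps = [math.exp(v) for v in window]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append(sum(w * v for w, v in zip(weights, window)))
    return out

y = dynamic_conv1d([0.0, 1.0, 0.0, 5.0, 0.0], k=3)
print(len(y))  # 3 valid positions for k=3 without padding
```

The deformable half of DDCN additionally learns *where* each filter tap samples (via per-position offsets), which this fixed-grid sketch does not attempt to show.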
- Asia > South Korea > Gwangju > Gwangju (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Transportation > Infrastructure & Services (0.88)
- Transportation > Ground > Road (0.68)
A Dataset and Benchmark for Robotic Cloth Unfolding Grasp Selection: The ICRA 2024 Cloth Competition
De Gusseme, Victor-Louis, Lips, Thomas, Proesmans, Remko, Hietala, Julius, Lee, Giwan, Choi, Jiyoung, Choi, Jeongil, Kim, Geon, Yonrith, Phayuth, Tabernik, Domen, Gams, Andrej, Nimac, Peter, Urbas, Matej, Muhovič, Jon, Skočaj, Danijel, Mavsar, Matija, Yu, Hyojeong, Kwon, Minseo, Kim, Young J., Cong, Yang, Chen, Ronghan, Ren, Yu, Diao, Supeng, Weng, Jiawei, Liu, Jiayue, Sun, Haoran, Yang, Linhan, Zhang, Zeqing, Guo, Ning, Yang, Lei, Wan, Fang, Song, Chaoyang, Pan, Jia, Jin, Yixiang, A, Yong, Shi, Jun, Li, Dingzhe, Yang, Yong, Yamasaki, Kakeru, Kajiwara, Takumi, Nakadera, Yuki, Saxena, Krati, Shibata, Tomohiro, Xia, Chongkun, Mo, Kai, Yu, Yanzhao, Lin, Qihao, Ma, Binqiang, Sagong, Uihun, Choi, JungHyun, Park, JeongHyun, Lee, Dongwoo, Kim, Yeongmin, Hwang, Myun Joong, Kuribayashi, Yusuke, Hiratsuka, Naoki, Tanaka, Daisuke, Arnold, Solvi, Yamazaki, Kimitoshi, Mateo-Agullo, Carlos, Verleysen, Andreas, Wyffels, Francis
Robotic cloth manipulation suffers from a lack of standardized benchmarks and shared datasets for evaluating and comparing different approaches. To address this, we created a benchmark and organized the ICRA 2024 Cloth Competition, a unique head-to-head evaluation focused on grasp pose selection for in-air robotic cloth unfolding. Eleven diverse teams participated in the competition, utilizing our publicly released dataset of real-world robotic cloth unfolding attempts and a variety of methods to design their unfolding approaches. Afterwards, we also expanded our dataset with 176 competition evaluation trials, resulting in a dataset of 679 unfolding demonstrations across 34 garments. Analysis of the competition results revealed insights into the trade-off between grasp success and coverage, the surprisingly strong achievements of hand-engineered methods, and a significant discrepancy between competition performance and prior work, underscoring the importance of independent, out-of-the-lab evaluation in robotic cloth manipulation. The associated dataset is a valuable resource for developing and evaluating grasp selection methods, particularly for learning-based approaches. We hope that our benchmark, dataset and competition results can serve as a foundation for future benchmarks and drive further progress in data-driven robotic cloth manipulation. The dataset and benchmarking code are available at https://airo.ugent.be/cloth_competition.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > China > Hong Kong (0.05)
- Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
- (13 more...)
- Research Report (1.00)
- Overview (0.93)
Recovering Plasticity of Neural Networks via Soft Weight Rescaling
Oh, Seungwon, Park, Sangyeon, Han, Isaac, Kim, Kyung-Joong
Recent studies have shown that as training progresses, neural networks gradually lose their capacity to learn new information, a phenomenon known as plasticity loss. Unbounded weight growth is one of the main causes of plasticity loss; it also harms generalization capability and disrupts optimization dynamics. Re-initializing the network can be a solution, but it results in the loss of learned information, leading to performance drops. In this paper, we propose Soft Weight Rescaling (SWR), a novel approach that prevents unbounded weight growth without losing information. SWR recovers the plasticity of the network by simply scaling down the weights at each step of the learning process. We theoretically prove that SWR bounds weight magnitude and balances weight magnitudes across layers. Our experiments show that SWR improves performance on warm-start learning, continual learning, and single-task learning setups on standard image classification benchmarks.

Recent works have revealed that a neural network loses its ability to learn new data as training progresses, a phenomenon known as plasticity loss. A pre-trained neural network shows inferior performance compared to a newly initialized model when trained on the same data (Ash & Adams, 2020; Berariu et al., 2021). Lyle et al. (2024b) demonstrated that unbounded weight growth is one of the main causes of plasticity loss and suggested weight decay and layer normalization as solutions.
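The core SWR operation, scaling a layer's weights down at each step to bound their magnitude without re-initializing, might look roughly like the sketch below. The interpolation rate and the choice of the initial norm as the target are illustrative assumptions, not the paper's exact schedule:

```python
def soft_weight_rescale(weights, init_norm, rate=0.01):
    """Soft-weight-rescaling sketch: after each update, pull the layer's
    weight norm part-way back toward its value at initialization,
    bounding growth while preserving the weight directions (and hence
    the learned information). The `rate` schedule and init-norm target
    are assumptions for illustration, not the paper's exact rule."""
    norm = sum(w * w for w in weights) ** 0.5
    if norm == 0.0:
        return weights
    # interpolate the norm toward init_norm by `rate`, then rescale
    target = (1 - rate) * norm + rate * init_norm
    scale = target / norm
    return [w * scale for w in weights]

w = [3.0, 4.0]  # norm 5.0
w2 = soft_weight_rescale(w, init_norm=1.0, rate=0.5)
print(round(sum(x * x for x in w2) ** 0.5, 6))  # 3.0: halfway from 5.0 toward 1.0
```

Because only the overall scale changes, the function computed by a homogeneous layer is preserved up to scaling, which is the intuition behind "without losing information" in the abstract.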
- Asia > South Korea > Gwangju > Gwangju (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
ViT-NeBLa: A Hybrid Vision Transformer and Neural Beer-Lambert Framework for Single-View 3D Reconstruction of Oral Anatomy from Panoramic Radiographs
Parida, Bikram Keshari, Sunilkumar, Anusree P., Sen, Abhijit, You, Wonsang
Dental diagnosis relies on two primary imaging modalities: panoramic radiographs (PX) providing 2D oral cavity representations, and Cone-Beam Computed Tomography (CBCT) offering detailed 3D anatomical information. While PX images are cost-effective and accessible, their lack of depth information limits diagnostic accuracy. CBCT addresses this but presents drawbacks including higher costs, increased radiation exposure, and limited accessibility. Existing reconstruction models further complicate the process by requiring CBCT flattening or prior dental arch information, often unavailable clinically. We introduce ViT-NeBLa, a vision transformer-based Neural Beer-Lambert model enabling accurate 3D reconstruction directly from single PX. Our key innovations include: (1) enhancing the NeBLa framework with Vision Transformers for improved reconstruction capabilities without requiring CBCT flattening or prior dental arch information, (2) implementing a novel horseshoe-shaped point sampling strategy with non-intersecting rays that eliminates intermediate density aggregation required by existing models due to intersecting rays, reducing sampling point computations by $52 \%$, (3) replacing CNN-based U-Net with a hybrid ViT-CNN architecture for superior global and local feature extraction, and (4) implementing learnable hash positional encoding for better higher-dimensional representation of 3D sample points compared to existing Fourier-based dense positional encoding. Experiments demonstrate that ViT-NeBLa significantly outperforms prior state-of-the-art methods both quantitatively and qualitatively, offering a cost-effective, radiation-efficient alternative for enhanced dental diagnostics.
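The Beer-Lambert model underlying NeBLa relates attenuation along a ray to transmittance, $T = \exp(-\sum_i \mu_i \, \Delta s)$. A minimal discretized sketch (uniform ray sampling, not the paper's horseshoe-shaped scheme):

```python
import math

def beer_lambert_transmittance(densities, step):
    """Discretized Beer-Lambert law along one ray:
    T = exp(-sum(mu_i * ds)), where mu_i are sampled attenuation
    coefficients and ds is the step length. This is the physical model
    behind the NeBLa rendering term; the uniform sampling here is a
    plain sketch, not the paper's non-intersecting-ray strategy."""
    optical_depth = sum(mu * step for mu in densities)
    return math.exp(-optical_depth)

# a ray passing through three samples of attenuation coefficient mu
t = beer_lambert_transmittance([0.5, 1.0, 0.5], step=1.0)
print(round(t, 4))  # 0.1353, i.e. exp(-2.0)
```

A panoramic radiograph pixel corresponds to such an accumulated transmittance, which is why recovering the per-point densities (the 3D volume) from a single projection is ill-posed and needs the learned prior the paper describes.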
- Europe > Switzerland (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (6 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Latent Behavior Diffusion for Sequential Reaction Generation in Dyadic Setting
Nguyen, Minh-Duc, Yang, Hyung-Jeong, Kim, Soo-Hyung, Shin, Ji-Eun, Kim, Seung-Won
The dyadic reaction generation task involves synthesizing responsive facial reactions that align closely with the behaviors of a conversational partner, enhancing the naturalness and effectiveness of human-like interaction simulations. This paper introduces a novel approach, the Latent Behavior Diffusion Model, comprising a context-aware autoencoder and a diffusion-based conditional generator that addresses the challenge of generating diverse and contextually relevant facial reactions from input speaker behaviors. The autoencoder compresses high-dimensional input features, capturing dynamic patterns in listener reactions while condensing complex input data into a concise latent representation, facilitating more expressive and contextually appropriate reaction synthesis. The diffusion-based conditional generator operates on the latent space generated by the autoencoder to predict realistic facial reactions in a non-autoregressive manner. This approach allows for generating diverse facial reactions that reflect subtle variations in conversational cues and emotional states. Experimental results demonstrate the effectiveness of our approach in achieving superior performance in dyadic reaction synthesis tasks compared to existing methods.
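For context, the diffusion-based conditional generator builds on the standard forward noising process $q(x_t \mid x_0)$; the sketch below shows that textbook step, applied here to a toy latent vector with an assumed schedule (not the paper's):

```python
import math
import random

def q_sample(x0, t, alphas_cumprod, eps=None):
    """Standard forward-diffusion step:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    The paper's conditional generator learns the reverse of this process
    in the autoencoder's latent space; this sketch only shows the
    noising formula it builds on, with a toy schedule."""
    if eps is None:
        eps = [random.gauss(0.0, 1.0) for _ in x0]
    a = alphas_cumprod[t]
    return [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(x0, eps)]

abar = [0.99, 0.9, 0.5, 0.1]  # toy cumulative-alpha schedule
x0 = [1.0, -1.0]              # toy latent code from the autoencoder
xt = q_sample(x0, t=2, alphas_cumprod=abar, eps=[0.0, 0.0])
print(xt)  # with zero noise this is just sqrt(0.5) * x0
```

Operating in the compact latent space rather than on raw facial features is what lets the generator run non-autoregressively over whole reaction sequences.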
- Research Report > Promising Solution (0.48)
- Research Report > New Finding (0.34)
Challenges and Trends in Egocentric Vision: A Survey
Li, Xiang, Qiu, Heqian, Wang, Lanxiao, Zhang, Hanwen, Qi, Chenghao, Han, Linfeng, Xiong, Huiyu, Li, Hongliang
With the rapid development of artificial intelligence technologies and wearable devices, egocentric vision understanding has emerged as a new and challenging research direction, gradually attracting widespread attention from both academia and industry. Egocentric vision captures visual and multimodal data through cameras or sensors worn on the human body, offering a unique perspective that simulates human visual experiences. This paper provides a comprehensive survey of the research on egocentric vision understanding, systematically analyzing the components of egocentric scenes and categorizing the tasks into four main areas: subject understanding, object understanding, environment understanding, and hybrid understanding. We explore in detail the sub-tasks within each category. We also summarize the main challenges and trends currently existing in the field. Furthermore, this paper presents an overview of high-quality egocentric vision datasets, offering valuable resources for future research. By summarizing the latest advancements, we anticipate the broad applications of egocentric vision technologies in fields such as augmented reality, virtual reality, and embodied intelligence, and propose future research directions based on the latest developments in the field.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
- (13 more...)
- Overview (1.00)
- Research Report > Promising Solution (0.45)
- Instructional Material > Course Syllabus & Notes (0.45)
- Leisure & Entertainment (1.00)
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area (0.67)
- (3 more...)